
Record: Two-Pass Order-12 N-gram Backoff + Parallel Muon — val_bpb 0.1310 (3-seed)#893

Open
aryanbhosale wants to merge 1 commit into openai:main from aryanbhosale:submission/twopass-ngram-0.1310

Conversation

@aryanbhosale

Record: Two-Pass Order-12 N-gram Backoff + Parallel Muon

val_bpb = 0.1310 (3-seed mean, std 0.0001) | ~15.85 MB | 8xH100 SXM

3-Seed Results

| Seed | Steps | EMA bpb | Pass 1 bpb | Pass 2 bpb |
|------|-------|---------|------------|------------|
| 1337 | 6,774 | 1.1193  | 0.2791     | 0.1310     |
| 42   | 6,757 | 1.1186  | 0.2790     | 0.1310     |
| 2024 | 6,769 | 1.1191  | 0.2791     | 0.1311     |
| Mean | 6,767 | 1.1190  | 0.2791     | 0.1310     |

Two-Pass N-gram Rescoring

Pass 1 builds a full order-2-12 N-gram cache over all validation tokens (0.279 bpb). Pass 2 then rescores the first 50 cold-cache chunks using the completed cache (0.131 bpb). Legality argument: every rescored token was already evaluated once in pass 1; pass 2 only revisits tokens whose first evaluation ran against a cold cache.

  • Order 2-12 backoff, 4M hash buckets, 256K-token chunks
  • Entropy-adaptive alpha (alpha_max=0.70), per-order multipliers
  • Training: 600s, eval: ~435s (both within budget)
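The cache-and-blend scheme described above might look roughly like the minimal sketch below. All names (`NGramCache`, `blend`), the rolling hash, and the `alpha_max / (1 + entropy)` schedule are illustrative assumptions, not the submission's actual code; the real implementation uses flat 4M-bucket hash tables and per-order multipliers.

```python
# Hypothetical sketch: order-2..12 hashed N-gram backoff with an
# entropy-adaptive interpolation weight (alpha_max = 0.70).
import math
from collections import defaultdict

class NGramCache:
    def __init__(self, max_order=12, num_buckets=4 * 2**20):
        self.max_order = max_order
        self.num_buckets = num_buckets
        # One count table per order: counts[order][bucket][next_token] -> n
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(max_order + 1)]

    def _bucket(self, context):
        # Cheap rolling hash of the context tuple into a fixed bucket range.
        h = 0
        for t in context:
            h = (h * 1000003 + t) % self.num_buckets
        return h

    def update(self, tokens):
        # Pass 1: accumulate counts for every order 2..max_order.
        for i in range(1, len(tokens)):
            for order in range(2, self.max_order + 1):
                if i - (order - 1) < 0:
                    break  # not enough left-context for this order
                ctx = tuple(tokens[i - (order - 1):i])
                self.counts[order][self._bucket(ctx)][tokens[i]] += 1

    def predict(self, context, token):
        # Back off from the longest matching order to the shortest.
        for order in range(self.max_order, 1, -1):
            ctx = tuple(context[-(order - 1):])
            if len(ctx) < order - 1:
                continue
            dist = self.counts[order].get(self._bucket(ctx))
            if dist:
                total = sum(dist.values())
                return dist.get(token, 0) / total, dist, total
        return None, None, None  # no N-gram evidence at any order

def blend(p_model, p_ngram, dist, total, alpha_max=0.70):
    # Entropy-adaptive alpha: trust the N-gram more when its
    # predictive distribution is low-entropy (confident).
    probs = [c / total for c in dist.values()]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    alpha = alpha_max / (1.0 + entropy)
    return (1 - alpha) * p_model + alpha * p_ngram
```

Under this reading, pass 2 is cheap: the count tables are already complete, so rescoring a cold-cache chunk is pure lookup with no further `update` calls.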

Architecture

  • 11L, 512d, Parallel Muon (~89 ms/step)
  • MLP 3x, LeakyReLU(0.5)^2 activation
  • BigramHash(1024), Value Residual, Gated Attention, XSA4
  • EMA + SWA weight averaging
  • GPTQ-lite int6 + zstd-22 weight compression, FA3
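One plausible reading of the "LeakyReLU(0.5)^2" activation, by analogy with the squared-ReLU MLPs common in speedrun models, is a LeakyReLU with negative slope 0.5 followed by squaring. This is an assumption about notation, not confirmed by the PR:

```python
def leaky_relu_sq(x: float, slope: float = 0.5) -> float:
    # Hypothetical reading of "LeakyReLU(0.5)^2": LeakyReLU with
    # negative slope `slope`, then square the result elementwise
    # (cf. squared-ReLU activations).
    y = x if x > 0 else slope * x
    return y * y
```

Note the squaring makes the output non-negative on both branches, so the leak survives only as a damped magnitude on negative inputs.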

Credits

@MatoTeziTanka

Nice work on the base model — hitting 1.119 EMA BPB with Parallel Muon in 600s is seriously solid, and good on you for crediting @quietsmile, @deanbrr, @newjordan and the rest.

Heads up though — the PR artifacts and the Issue #140 claim seem out of sync:

Guessing the 0.1310 came from a separate run that didn't get included? Easy fix — just swap in the updated logs and submission.json. Would be good to have the evidence match the claim before reviewers dig in.

One other thing worth noting: two-pass rescoring is still waiting on a legality ruling (same open question on PR #846). Not saying it's illegal — just that it's unresolved and reviewers will ask.

Solid foundation either way. That 1.119 neural baseline alone is competitive.

@arbyte77 force-pushed the submission/twopass-ngram-0.1310 branch from 30f414b to aff6a98 on March 27, 2026 05:19
@aryanbhosale
Author

Good catch — the logs were from the single-pass run, not the two-pass run. Just force-pushed with the correct logs. All 3 seeds now show ngram_pass2:done with the 0.131x BPB results:

  • seed 1337: ngram_eval_exact val_bpb:0.13103810
  • seed 42: ngram_eval_exact val_bpb:0.13099732
  • seed 2024: ngram_eval_exact val_bpb:0.13106891

Noted on the two-pass legality question — will keep an eye on the PR #846 ruling. The neural baseline (1.119 EMA) stands either way.
